Chapter 17

More of a Good Thing: Multiple Regression

IN THIS CHAPTER

Understanding what multiple regression is

Preparing your data and interpreting the output

Understanding how interactions and collinearity affect regression analysis

Estimating the number of participants you need for a multiple regression analysis

Chapter 15 introduces the general concepts of correlation and regression, two related techniques for detecting and characterizing the relationship between two or more variables. Chapter 16 describes the simplest kind of regression — fitting a straight line to a set of data consisting of one independent variable (the predictor) and one dependent variable (the outcome). The formula relating the predictor to the outcome, known as the model, is of the form Y = a + bX, where Y is the outcome, X is the predictor, and a and b are parameters (also called regression coefficients). This kind of regression is usually the only one you encounter in an introductory statistics course, because it's a relatively simple way to do a regression. It's good for beginners to learn!
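If you like to see a model in code, here is a minimal sketch in Python with NumPy (the data values and variable names are invented purely for illustration) that fits a straight line of this form and reports the intercept a and slope b:

import numpy as np

# Hypothetical data: one predictor (X) and one outcome (Y)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# polyfit with degree 1 returns the coefficients of the least-squares
# straight line, highest power first: [slope b, intercept a]
b, a = np.polyfit(X, Y, deg=1)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")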

This chapter extends simple straight-line regression to more than one predictor — to what's called the ordinary multiple linear regression model, or more simply, multiple regression.

Understanding the Basics of Multiple Regression

In Chapter 16, we outline the derivation of the formulas for determining the parameters of a straight line so that the line — defined by an intercept at the Y axis and a slope — comes as close as possible to all the data points (imagine a scatter plot). The term as close as possible is operationalized as a least-squares line, meaning we are looking for the line where the sum of the squares (SSQ) of the vertical distances of each point from the line is the smallest. The SSQ is smaller for the least-squares line than for any other line you could possibly draw.
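To make the least-squares idea concrete, here is a small Python/NumPy sketch (with made-up data) that computes the SSQ for the fitted line and for an arbitrary line you might draw by hand; the fitted line's SSQ is never larger:

import numpy as np

# Hypothetical scatter-plot data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssq(a, b):
    # Sum of squared vertical distances from each point to the line Y = a + b*X
    residuals = Y - (a + b * X)
    return float(np.sum(residuals ** 2))

# The least-squares line, as found by NumPy
b_ls, a_ls = np.polyfit(X, Y, deg=1)

print("SSQ for the least-squares line:", ssq(a_ls, b_ls))
print("SSQ for a line drawn by eye:   ", ssq(0.5, 2.2))  # always at least as large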

The same idea can be extended to multiple regression models containing more than one predictor (which means estimating more than two parameters). For two predictor variables, you're fitting a plane, which is a flat sheet. Imagine fitting a set of points to this plane in three dimensions (meaning you'd be adding a Z axis to your X and Y). Now, extend your imagination: for more than two predictors, the regression fits a hyperplane to points in four-or-more-dimensional space. Hyperplanes in multidimensional space may sound mind-blowing, but luckily for us, the actual formulas are simple algebraic extensions of the straight-line formulas.
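As a concrete illustration, here is a minimal Python/NumPy sketch (again with invented data) that fits a plane with two predictors, which is the multiple regression model Y = a + b1*X1 + b2*X2:

import numpy as np

# Hypothetical data: two predictors and one outcome
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([4.1, 4.9, 8.0, 8.9, 12.2, 12.8])

# Design matrix: a column of 1s for the intercept, then one column per predictor
design = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares fit of the plane Y = a + b1*X1 + b2*X2
params, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b1, b2 = params
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")

The only change from the straight-line case is that the design matrix gains one extra column for each additional predictor.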

In the following sections, we define some basic terms related to multiple regression, and explain when you should use it.

Defining a few important terms